Leveraging knowledge graphs to update scientific word embeddings using latent semantic imputation
The most interesting words in scientific texts will often be novel or rare.
This makes it difficult for scientific word embedding models to learn
high-quality embedding vectors for useful terms that are infrequent or newly
emerging. We demonstrate how latent semantic imputation (LSI) can address
this problem by imputing
embeddings for domain-specific words from up-to-date knowledge graphs while
otherwise preserving the original word embedding model. We use the MeSH
knowledge graph to impute embedding vectors for biomedical terminology without
retraining and evaluate the resulting embedding model on a domain-specific
word-pair similarity task. We show that LSI can produce reliable embedding
vectors for rare and out-of-vocabulary (OOV) terms in the biomedical domain.
Comment: Accepted for the Workshop on Information Extraction from Scientific
Publications at AACL-IJCNLP 202
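
The imputation idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function name `impute_embeddings`, its parameters, and the toy data are hypothetical, and simple inverse-distance neighbour weights stand in for whatever weight-fitting procedure the paper uses. It assumes every term has a feature vector derived from the knowledge graph (the "domain" space), that only some terms have trained embeddings, and that a missing embedding can be approximated as a weighted average of its neighbours' embeddings while the known vectors are held fixed.

```python
import numpy as np

def impute_embeddings(domain, known, n_neighbors=2, n_iter=200, tol=1e-8):
    """Impute embeddings for terms that lack one (illustrative sketch).

    domain: (n, d_dom) knowledge-graph features for all n terms.
    known:  (n_known, d_emb) trained embeddings for the first n_known terms.
    Returns an (n, d_emb) matrix; the first n_known rows equal `known`.
    """
    n, n_known = domain.shape[0], known.shape[0]

    # Pairwise Euclidean distances in the domain (knowledge-graph) space.
    diff = domain[:, None, :] - domain[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a term is never its own neighbour

    # Row-normalised inverse-distance weights over the nearest neighbours.
    # ASSUMPTION: a stand-in weighting scheme chosen for simplicity.
    weights = np.zeros((n, n))
    for i in range(n_known, n):
        nbrs = np.argsort(dist[i])[:n_neighbors]
        w = 1.0 / (dist[i, nbrs] + 1e-12)
        weights[i, nbrs] = w / w.sum()

    # Fixed-point iteration: known rows stay fixed, unknown rows are
    # repeatedly replaced by the weighted average of their neighbours.
    emb = np.zeros((n, known.shape[1]))
    emb[:n_known] = known
    for _ in range(n_iter):
        new = emb.copy()
        new[n_known:] = weights[n_known:] @ emb
        converged = np.abs(new - emb).max() < tol
        emb = new
        if converged:
            break
    return emb

# Toy example: two terms with trained embeddings, one term without.
domain = np.array([[0.0, 0.0], [10.0, 0.0], [1.0, 0.0]])
known = np.array([[1.0, 0.0], [0.0, 1.0]])
emb = impute_embeddings(domain, known)
print(emb[2])  # imputed vector lies much closer to the first known term
```

Because the known rows are never overwritten, the original embedding model is preserved and only the missing vectors are filled in, which mirrors the "without retraining" property described in the abstract.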